207 research outputs found

    Setting decision thresholds when operating conditions are uncertain

    The quality of the decisions made by a machine learning model depends on the data and the operating conditions during deployment. Often, operating conditions such as class distribution and misclassification costs have changed since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier's scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but still the decision for each point is made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition and study it theoretically in such a way that minimum and maximum uncertainty are both seen as special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level.

    We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grant TIN 2015-69175-C4-1-R and by Generalitat Valenciana under Grant PROMETEOII/2015/013.
    Jose Hernandez-Orallo was supported by a Salvador de Madariaga Grant (PRX17/00467) from the Spanish MECD for a research stay at the Leverhulme Centre for the Future of Intelligence (CFI), Cambridge, a BEST Grant (BEST/2017/045) from Generalitat Valenciana for another research stay also at the CFI and an FLI Grant RFP2-152.

    Ferri Ramírez, C.; Hernández-Orallo, J.; Flach, P. (2019). Setting decision thresholds when operating conditions are uncertain. Data Mining and Knowledge Discovery. 33(4):805-847. https://doi.org/10.1007/s10618-019-00613-7
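
    As a rough illustration of the threshold-setting problem described in the entry above, the following sketch uses the standard decision rule for a calibrated probability estimator (predict positive when the score is at least cost_FP / (cost_FP + cost_FN)) and shows how a threshold chosen for an assumed cost ratio can be suboptimal when the actual ratio differs. The scores, labels, costs and function names are hypothetical toy values, and this is not the authors' method or their R code.

```python
def cost_proportion(cost_fp: float, cost_fn: float) -> float:
    """Threshold for a calibrated score: predict positive when score >= c."""
    return cost_fp / (cost_fp + cost_fn)


def expected_loss(scores, labels, threshold, cost_fp, cost_fn):
    """Average misclassification cost when predicting positive for score >= threshold."""
    total = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            total += cost_fp  # false positive
        elif predicted == 0 and label == 1:
            total += cost_fn  # false negative
    return total / len(scores)


# Hypothetical toy data: scores are estimated probabilities of the positive class.
scores = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Operating condition assumed when the threshold is chosen (hypothetical).
assumed_threshold = cost_proportion(cost_fp=1.0, cost_fn=4.0)

# Operating condition actually met at deployment (hypothetical).
actual_fp, actual_fn = 4.0, 1.0
matched_threshold = cost_proportion(actual_fp, actual_fn)

loss_mismatched = expected_loss(scores, labels, assumed_threshold, actual_fp, actual_fn)
loss_matched = expected_loss(scores, labels, matched_threshold, actual_fp, actual_fn)
print(assumed_threshold, matched_threshold, loss_mismatched, loss_matched)
```

    On this toy data the threshold tuned for the assumed costs incurs a higher loss under the actual costs than the threshold matched to them, which is the kind of gap the abstract refers to.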

    A Probabilistic Framework for Non-Cheating Machine Teaching

    Over the past decades in the field of machine teaching, several restrictions have been introduced to avoid ‘cheating’, such as collusion-free or non-clashing teaching. However, these restrictions forbid several teaching situations that we intuitively consider natural and fair, especially those ‘changes of mind’ of the learner as more evidence is given, affecting the likelihood of concepts and ultimately their posteriors. Under a new generalised probabilistic teaching framework, not only do these non-cheating constraints look too narrow but we also show that the most relevant machine teaching models are particular cases of this framework: the consistency graph between concepts and elements simply becomes a joint probability distribution. We show a simple procedure that builds the witness joint distribution from the ground joint distribution. We prove a chain of relations, also with a theoretical lower bound, on the teaching dimension of the old and new models. Overall, this new setting is more general than the traditional machine teaching models, yet at the same time it captures more intuitively a less abrupt notion of non-cheating teaching.

    Ferri Ramírez, C.; Hernández Orallo, J.; Telle, JA. (2022). A Probabilistic Framework for Non-Cheating Machine Teaching. http://hdl.handle.net/10251/18236

    The teaching size: computable teachers and learners for universal languages

    The theoretical hardness of machine teaching has usually been analyzed for a range of concept languages under several variants of the teaching dimension: the minimum number of examples that a teacher needs to figure out so that the learner identifies the concept. However, for languages where concepts have structure (and hence size), such as Turing-complete languages, a low teaching dimension can be achieved at the cost of using very large examples, which are hard for the learner to process. In this paper we introduce the teaching size, a more intuitive way of assessing the theoretical feasibility of teaching concepts for structured languages. In the most general case of universal languages, we show that, by focusing on the total size of a witness set rather than its cardinality, we can teach all total functions that are computable within some fixed time bound. We complement the theoretical results with a range of experimental results on a simple Turing-complete language, showing how teaching dimension and teaching size differ in practice. Quite remarkably, we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.

    We would like to thank the anonymous referees for their helpful comments. This work was supported by the EU (FEDER) and the Spanish MINECO under grant RTI2018-094403-B-C32, and the Generalitat Valenciana PROMETEO/2019/098. This work was done while the first author visited Universitat Politecnica de Valencia and also while the third author visited University of Bergen (covered by Generalitat Valenciana BEST/2018/027 and University of Bergen). J. Hernandez-Orallo is also funded by an FLI grant RFP2-152.

    Telle, JA.; Hernández-Orallo, J.; Ferri Ramírez, C. (2019). The teaching size: computable teachers and learners for universal languages. Machine Learning. 108(8-9):1653-1675.
    https://doi.org/10.1007/s10994-019-05821-2

    Cycling network projects: a decision-making aid approach

    Efficient and clean urban mobility is a key factor in quality of life and sustainability of towns and cities. Traditionally, cities have focused on cars and other fuel-based vehicles as means of transport. However, several problems are directly linked to massive car use, particularly in terms of air pollution and traffic congestion. Several works reckon that vehicle emissions produce over 90% of air pollution. One way to reduce the use of fuel-based vehicles (and thus the emission of pollutants) is to create efficient, easily accessible and secure bike lane networks which, as many studies show, promote cycling as a major means of conveyance. In this regard, this paper presents an approach to design and calculate bike lane networks based on the use of open data about the historical use of an urban bike rental service. Concretely, we model this task as a network design problem (NDP) and we study four different optimisation strategies to solve it. We test these methods using data from the city of Valencia (Spain). Our experiments conclude that an optimisation approach based on genetic programming obtains the best performance. The proposed method can be easily used to improve or extend bike lane networks based on historic bike use data in other cities.

    This work has been partially supported by the EU (FEDER) and Spanish MINECO grant TIN2015-69175-C4-1-R, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences Technologies ERA-Net (CHIST-ERA), and funded by MINECO in Spain (PCIN-2013-037), by Generalitat Valenciana PROMETEOII/2015/013, and by the French National Research agency (ANR).

    Martínez Plumed, F.; Ferri Ramírez, C.; Contreras Ochando, L. (2016). Cycling network projects: a decision-making aid approach. CEUR Workshop Proceedings. http://hdl.handle.net/10251/87734

    Learning with configurable operators and RL-based heuristics

    In this paper, we push forward the idea of machine learning systems for which the operators can be modified and finetuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators, according to the problem, data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as a result of a decision process based on reinforcement learning where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or as a way to explore new operators.

    This work was supported by the MEC projects CONSOLIDER-INGENIO 26706 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain. Also, F. Martínez-Plumed is supported by FPI-ME grant BES-2011-045099.

    Martínez Plumed, F.; Ferri Ramírez, C.; Hernández Orallo, J.; Ramírez Quintana, MJ. (2013). Learning with configurable operators and RL-based heuristics. In New Frontiers in Mining Complex Patterns. Springer Verlag (Germany). 7765:1-16. https://doi.org/10.1007/978-3-642-37382-4_1

    A computational analysis of general intelligence tests for evaluating cognitive development

    The progression in several cognitive tests for the same subjects at different ages provides valuable information about their cognitive development. One question that has caught recent interest is whether the same approach can be used to assess the cognitive development of artificial systems. In particular, can we assess whether the fluid or crystallised intelligence of an artificial cognitive system is changing during its cognitive development as a result of acquiring more concepts? In this paper, we address several IQ test problems (odd-one-out problems, Raven's Progressive Matrices and Thurstone's letter series) with a general learning system that is not specifically designed to solve intelligence tests. The goal is to better understand the role of the basic cognitive operational constructs (such as identity, difference, order, counting, logic, etc.) that are needed to solve these intelligence test problems and serve as a proof-of-concept for evaluation in other developmental problems. From here, we gain some insights into the characteristics and usefulness of these tests and how careful we need to be when applying human test problems to assess the abilities and cognitive development of robots and other artificial cognitive systems.

    This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2015-69175-C4-1-R and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana under grant PROMETEOII/2015/013.

    Martínez-Plumed, F.; Ferri Ramírez, C.; Hernández-Orallo, J.; Ramírez Quintana, MJ. (2017). A computational analysis of general intelligence tests for evaluating cognitive development. Cognitive Systems Research. 43:100-118. https://doi.org/10.1016/j.cogsys.2017.01.006

    Aggregative quantification for regression

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10618-013-0308-z

    The problem of estimating the class distribution (or prevalence) for a new unlabelled dataset (from a possibly different distribution) is a very common problem which has been addressed in one way or another in the past decades. This problem has been recently reconsidered as a new task in data mining, renamed quantification when the estimation is performed as an aggregation (and possible adjustment) of a single-instance supervised model (e.g., a classifier). However, the study of quantification has been limited to classification, while it is clear that this problem also appears, perhaps even more frequently, with other predictive problems, such as regression. In this case, the goal is to determine a distribution or an aggregated indicator of the output variable for a new unlabelled dataset. In this paper, we introduce a comprehensive new taxonomy of quantification tasks, distinguishing between the estimation of the whole distribution and the estimation of some indicators (summary statistics), for both classification and regression. This distinction is especially useful for regression, since predictions are numerical values that can be aggregated in many different ways, as in multi-dimensional hierarchical data warehouses. We focus on aggregative quantification for regression and see that the approaches borrowed from classification do not work. We present several techniques based on segmentation which are able to produce accurate estimations of the expected value and the distribution of the output variable. We show experimentally that these methods especially excel for the relevant scenarios where training and test distributions dramatically differ.

    We would like to thank the anonymous reviewers for their careful reviews, insightful comments and very useful suggestions.
    This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST-European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain.

    Bella Sanjuán, A.; Ferri Ramírez, C.; Hernández Orallo, J.; Ramírez Quintana, MJ. (2014). Aggregative quantification for regression. Data Mining and Knowledge Discovery. 28(2):475-518. https://doi.org/10.1007/s10618-013-0308-z

    Missing the missing values: The ugly duckling of fairness in machine learning

    [EN] Nowadays, there is an increasing concern in machine learning about the causes underlying unfair decision making, that is, algorithmic decisions discriminating some groups over others, especially with groups that are defined over protected attributes, such as gender, race and nationality. Missing values are one frequent manifestation of all these latent causes: protected groups are more reluctant to give information that could be used against them, sensitive information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we present the first comprehensive analysis of the relation between missing values and algorithmic fairness for machine learning: (1) we analyse the sources of missing data and bias, mapping the common causes, (2) we find that rows containing missing values are usually fairer than the rest, which should discourage the consideration of missing values as the uncomfortable ugly data that different techniques and libraries for handling algorithmic bias get rid of at the first occasion, (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique deals with them directly or by imputation methods), and (4) we show that the sensitivity of six different machine-learning techniques to missing values is usually low, which reinforces the view that the rows with missing data contribute more to fairness through the other, nonmissing, attributes. 
We end the paper with a series of recommended procedures about what to do with missing data when aiming for fair decision making. Funding: Ministerio de Economia, Industria y Competitividad, Gobierno de Espana (ES), Grant/Award Number: RTI2018-094403-B-C3; Generalitat Valenciana, Grant/Award Number: PROMETEO/2019/09; Future of Life Institute, Grant/Award Number: RFP2-15; European Commission, Grant/Award Number: DG JRC - HUMAINT project. Martínez-Plumed, F.; Ferri Ramírez, C.; Nieves, D.; Hernández-Orallo, J. (2021). Missing the missing values: The ugly duckling of fairness in machine learning. International Journal of Intelligent Systems. 36(7):3217-3258. https://doi.org/10.1002/int.22415

    Report of the First International Workshop on Learning over Multiple Contexts (LMCE 2014)

    © ACM 2015. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM SIGKDD Explorations Newsletter, http://dx.doi.org/10.1145/2830544.2830551. The first international workshop on Learning over Multiple Contexts, devoted to generalization and reuse of machine learning models over multiple contexts, was held on September 19th, 2014, as part of the 7th European machine learning and data mining conference (ECML-PKDD 2014) in Nancy, France. This short report summarizes the presentations and discussions held during the LMCE 2014 workshop, as well as the workshop conclusions and the future agenda. Ferri Ramírez, C.; Flach, P.; Lachiche, N. (2015). Report of the First International Workshop on Learning over Multiple Contexts (LMCE 2014). ACM SIGKDD Explorations Newsletter. 17(1):48-50. doi:10.1145/2830544.2830551